An Improved Data Stream Summary: The Count-Min Sketch and Its Applications

نویسندگان

  • Graham Cormode
  • S. Muthukrishnan
چکیده

We introduce a new sublinear space data structure—the count-min sketch—for summarizing data streams. Our sketch allows fundamental queries in data stream summarization such as point, range, and inner product queries to be approximately answered very quickly; in addition, it can be applied to solve several important problems in data streams such as finding quantiles, frequent items, etc. The time and space bounds we show for using the CM sketch to solve these problems significantly improve those previously known—typically from 1/ε2 to 1/ε in factor.  2003 Elsevier Inc. All rights reserved.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Stream Clustering using Probabilistic Data Structures

Most density based stream clustering algorithms separate the clustering process into an online and offline component. Exact summarized statistics are being employed for defining micro-clusters or grid cells during the online stage followed by macro-clustering during the offline stage. This paper proposes a novel alternative to the traditional two phase stream clustering scheme, introducing sket...

متن کامل

Cold Filter: A Meta-Framework for Faster and More Accurate Stream Processing

Approximate stream processing algorithms, such as Count-Min sketch, Space-Saving, etc., support numerous applications in databases, storage systems, networking, and other domains. However, the unbalanced distribution in real data streams poses great challenges to existing algorithms. To enhance these algorithms, we propose a meta-framework, called Cold Filter (CF), that enables faster and more ...

متن کامل

Sketching Techniques for Large Scale NLP

In this paper, we address the challenges posed by large amounts of text data by exploiting the power of hashing in the context of streaming data. We explore sketch techniques, especially the CountMin Sketch, which approximates the frequency of a word pair in the corpus without explicitly storing the word pairs themselves. We use the idea of a conservative update with the Count-Min Sketch to red...

متن کامل

Fast Approximate Wavelet Tracking on Streams

Recent years have seen growing interest in effective algorithms for summarizing and querying massive, high-speed data streams. Randomized sketch synopses provide accurate approximations for general-purpose summaries of the streaming data distribution (e.g., wavelets). The focus of existing work has typically been on minimizing space requirements of the maintained synopsis — however, to effectiv...

متن کامل

236779 : Foundations of Algorithms for Massive Datasets Nov 11 2015 Lecture

These notes cover the end of the Frequent-items (Batch-Decrement) sketch, the Count-Min sketch, the F2 Tug-of-War sketch (AMS), and initial background for dimensionality reduction and the Johnson-Lindenstrauss transform. 1 Reminder: Frequency Moments We are given a stream (sequence) of N characters (or items) a1, a2, . . . , aN from a large alphabet Σ of size |Σ| = n. Definition 1. A histogram ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004